27 research outputs found

    Exploring Metaphorical Senses and Word Representations for Identifying Metonyms

    Full text link
    A metonym is a word with a figurative meaning, similar to a metaphor. Because metonyms are closely related to metaphors, we apply features that are used successfully for metaphor recognition to the task of detecting metonyms. On the ACL SemEval 2007 Task 8 data with gold standard metonym annotations, our system achieved 86.45% accuracy on the location metonyms. Our code can be found on GitHub.Comment: 9 pages, 8 pages conten

    Geocoding location expressions in Twitter messages: A preference learning method

    Get PDF
    Resolving location expressions in text to the correct physical location, also known as geocoding or grounding, is complicated by the fact that so many places around the world share the same name. Correct resolution is made even more difficult when there is little context to determine which place is intended, as in a 140-character Twitter message, or when location cues from different sources conflict, as may be the case among different metadata fields of a Twitter message. We used supervised machine learning to weigh the different fields of the Twitter message and the features of a world gazetteer to create a model that will prefer the correct gazetteer candidate to resolve the extracted expression. We evaluated our model using the F1 measure and compared it to similar algorithms. Our method achieved results higher than state-of-the-art competitors

    Recognizing Extended Spatiotemporal Expressions by Actively Trained Average Perceptron Ensembles

    Full text link
    Precise geocoding and time normalization for text requires that location and time phrases be identified. Many state-of-the-art geoparsers and temporal parsers suffer from low recall. Categories commonly missed by parsers are: nouns used in a non- spatiotemporal sense, adjectival and adverbial phrases, prepositional phrases, and numerical phrases. We collected and annotated data set by querying commercial web searches API with such spatiotemporal expressions as were missed by state-of-the- art parsers. Due to the high cost of sentence annotation, active learning was used to label training data, and a new strategy was designed to better select training examples to reduce labeling cost. For the learning algorithm, we applied an average perceptron trained Featurized Hidden Markov Model (FHMM). Five FHMM instances were used to create an ensemble, with the output phrase selected by voting. Our ensemble model was tested on a range of sequential labeling tasks, and has shown competitive performance. Our contributions include (1) an new dataset annotated with named entities and expanded spatiotemporal expressions; (2) a comparison of inference algorithms for ensemble models showing the superior accuracy of Belief Propagation over Viterbi Decoding; (3) a new example re-weighting method for active ensemble learning that 'memorizes' the latest examples trained; (4) a spatiotemporal parser that jointly recognizes expanded spatiotemporal expressions as well as named entities.Comment: 10 page

    A genetic investigation of sex bias in the prevalence of attention-deficit/hyperactivity disorder

    Get PDF
    Background Attention-deficit/hyperactivity disorder (ADHD) shows substantial heritability and is 2-7 times more common in males than females. We examined two putative genetic mechanisms underlying this sex bias: sex-specific heterogeneity and higher burden of risk in female cases. Methods We analyzed genome-wide autosomal common variants from the Psychiatric Genomics Consortium and iPSYCH Project (20,183 cases, 35,191 controls) and Swedish populationregister data (N=77,905 cases, N=1,874,637 population controls). Results Genetic correlation analyses using two methods suggested near complete sharing of common variant effects across sexes, with rg estimates close to 1. Analyses of population data, however, indicated that females with ADHD may be at especially high risk of certain comorbid developmental conditions (i.e. autism spectrum disorder and congenital malformations), potentially indicating some clinical and etiological heterogeneity. Polygenic risk score (PRS) analysis did not support a higher burden of ADHD common risk variants in female cases (OR=1.02 [0.98-1.06], p=0.28). In contrast, epidemiological sibling analyses revealed that the siblings of females with ADHD are at higher familial risk of ADHD than siblings of affected males (OR=1.14, [95% CI: 1.11-1.18], p=1.5E-15). Conclusions Overall, this study supports a greater familial burden of risk in females with ADHD and some clinical and etiological heterogeneity, based on epidemiological analyses. However, molecular genetic analyses suggest that autosomal common variants largely do not explain the sex bias in ADHD prevalence

    MapSearch: a protocol and prototype application to find maps

    No full text
    Even geographers need ways to find what they need among the thousands of maps buried in map libraries and in journal articles. It is not enough to provide search by region and keyword. Studies of queries show that people often want to look for maps showing a certain location at a certain time period or with a subject theme. The difficulties in finding such maps are several. Maps in physical and digital collections often are organized by region. Multi-dimensional manual indexing is time-consuming and so many maps are not indexed. Further, maps in non-geographical publications are indexed rarely, making them essentially invisible. In an attempt to solve actual problems, this dissertation research automatically indexes maps in published documents so that they become visible to searchers. The MapSearch prototype aggregates journal components to allow finer-grained searching of article content. MapSearch allows search by region, time, or theme as well as by keyword (http://scilsresx.rutgers.edu/~gelern/maps/). Automatic classification of maps is a multi-step process. A sample of 150 maps and the text (that becomes metadata) describing the maps have been copied from a random assortment of journal articles. Experience taking metadata manually enabled the writing of instructions to mine data automatically; experience with manual classification allowed for writing algorithms that classify maps by region, time and theme automatically. That classification is supported by ontologies for region, time and theme that have been generated or adapted for the purpose and that allow what has been called intelligent search, or smart search. The 150 map training set was loaded into the MapSearch engine repeatedly, each time comparing automatically-assigned classification to manually-assigned classification. Analysis of computer misclassifications suggested whether the ontology or classification algorithm should be modified in order to improve classification accuracy. After repeated trials and analyses to improve the algorithms and ontologies, MapSearch was evaluated with a set of 55 previously unseen maps in a test set. Automated classification of the test set of maps was compared to the manual classification, with the assumption that the manual process provides the most accurate classification obtainable. Results showed an accuracy, or a correspondence between manual and automated classification, of 75% for region, 69% for time, and 84% for theme. The dissertation contributes: (1) a protocol to harvest metadata from maps in published articles that could be adapted to aggregate other sorts of journal article components such as charts, diagrams, cartoons or photographs, (2) a method for ontology-supported metadata processing to allow for improved result relevance that could be applied to other sorts of data, (3) algorithms to classify maps into region, time and theme facets that could be adapted to classify other document types, and (4) a proof-of-concept MapSearch system that could be expanded with heterogeneous map types.Ph.D.Includes bibliographical references (p. 117-125)

    Use of Ontologies for Data Integration and Curation

    Get PDF
    Data curation includes the goal of facilitating the re-use and combination of datasets, which is often impeded by incompatible data schema. Can we use ontologies to help with data integration? We suggest a semi-automatic process that involves the use of automatic text searching to help identify overlaps in metadata that accompany data schemas, plus human validation of suggested data matches. Problems include different text used to describe the same concept, different forms of data recording and different organizations of data. Ontologies can help by focussing attention on important words, providing synonyms to assist matching, and indicating in what context words are used. Beyond ontologies, data on the statistical behavior of data can be used to decide which data elements appear to be compatible with which other data elements. When curating data which may have hundreds or even thousands of data labels, semi-automatic assistance with data fusion should be of great help. 1
    corecore